Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[WIP] Use mapping format for reading date values #169

Conversation

Yury-Fridlyand
Copy link

@Yury-Fridlyand Yury-Fridlyand commented Nov 16, 2022

Signed-off-by: Yury-Fridlyand yuryf@bitquilltech.com

Description

The fix is to collect formats from mapping and pass it to OpenSearchExprValueFactory.
I added mapping2 member to IndexMapping and OpenSearchExprValueFactory which hold required data. The original mapping is not removed yet to ensure that everything works as was before.

Test data

date_formats.json.txt
date_formats_mappings.json.txt

curl -s -XDELETE "http://localhost:9200/date-formats?pretty"
curl -s -H 'Content-Type: application/json' -XPUT "http://localhost:9200/date-formats?pretty" --data-binary @date_formats_mappings.json
curl -s -H 'Content-Type: application/json' -XPOST "http://localhost:9200/date-formats/_bulk?pretty" --data-binary @date_formats.json

Test queries

select * from date-formats;
select <format> from date-formats where name = '<format>';
search source=date-formats;
search source=date-formats | where name='<format>' | fields <format>;

For example, epoch_second (was reported on forum)

opensearchsql> select epoch_second from date-formats where name = 'epoch_second';
fetched rows / total rows = 1/1
+-------------------------------+
| epoch_second                  |
|-------------------------------|
| 1984-04-12 09:07:42.000123456 |
+-------------------------------+

Unsupported formats

weekyear
strict_weekyear
weekyear_year
strict_weekyear
year
strict_year
year_month
strict_year_month

(see links below)
They fall back to error:

// TODO throw exception
LogManager.getLogger(OpenSearchExprValueFactory.class).error(
String.format("Can't recognize parsed value: %s, %s", parsed, parsed.getClass()));
return new ExprStringValue(value.stringValue());

Time interpreting issue

return new ExprTimestampValue(new ExprTimeValue(LocalTime.from(parsed)).timestampValue());

goes to

There is no option to access queryContext there.
This could be fixed by one of

  • Returning ExprTimeValue
  • Applying time value on top of Epoch or any other fixed date. Note - this should be consistent across the plugin.

TBD

  • What to do with LocalTime?
  • What to do with unsupported formats?
  • Can we return not ExprTimestampValue for date and time columns? Note: ensure that entire column has the same or similar format(s).

TODO

Links

Issues Resolved

opensearch-project#794
https://forum.opensearch.org/t/sql-select-fails-on-date-fields-format-epoch-second/11521

Check List

  • New functionality includes testing.
    • All tests pass, including unit test, integration test and doctest
  • New functionality has been documented.
    • New functionality has javadoc added
    • New functionality has user manual doc added
  • Commits are signed per the DCO using --signoff

By submitting this pull request, I confirm that my contribution is made under the terms of the Apache 2.0 license.
For more information on following Developer Certificate of Origin and signing off your commits, please check here.

Signed-off-by: Yury-Fridlyand <yuryf@bitquilltech.com>
… update tests.

Signed-off-by: Yury-Fridlyand <yuryf@bitquilltech.com>
Signed-off-by: Yury-Fridlyand <yuryf@bitquilltech.com>
@@ -57,6 +57,21 @@ public LocalTime timeValue() {
return time;
}

@Override
public LocalDate dateValue() {
return LocalDate.now();

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What was this required for?

It would need to account for query start time.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  1. We return TIMESTAMP for OpenSearch date type => we return ExprTimestampValue
  2. An index can store time in date field => parser returns LocalTime

To satisfy both 1 and 2 we have to build ExprTimestampValue from LocalTime which calls to LocalDate.now().
I see few ways to fix/avoid this:

  1. Don't return ExprTimestampValue where it is possible. It is nice to have, but still we would have to build ExprTimestampValue from LocalTime.
  2. Change how we build ExprTimestampValue from LocalTime - use a fixed date (e.g. Epoch) instead of today.
  3. Deliver queryContext to OpenSearchExprValueFactory in opensearch module.

@Yury-Fridlyand
Copy link
Author

@penghuo, @dai-chen, please have a look on this PoC. Your feedback and opinion is highly appreciated.

Comment on lines +91 to +92
private final Map<ExprType, BiFunction<Content, MappingEntry, ExprValue>> typeActionMap =
new ImmutableMap.Builder<ExprType, BiFunction<Content, MappingEntry, ExprValue>>()

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Because MappingEntry is only used by date time, I'm think can we add this mapping info to Content?

Comment on lines +20 to +21
@AllArgsConstructor
public class MappingEntry {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We may need more thoughts about this new abstraction. For now it seems only for OpenSearch date field mapping. Can we extend it to carry mapping info for all OpenSearch fields? For example, multi-field may use this abstraction to get inner field name and type.

{
  "mappings": {
    "properties": {
      "city": {
        "type": "text",
        "fields": {
          "raw": { 
            "type":  "keyword"
          }
        }
      }
    }
  }
}

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, current implementation doesn't work with keyword and nested types.

Comment on lines +114 to +115
namedAggregatorList.forEach(agg -> builder.put(agg.getName(), new MappingEntry(null, null, agg.type())));
groupByList.forEach(group -> builder.put(group.getNameOrAlias(), new MappingEntry(null, null, group.type())));
Copy link

@dai-chen dai-chen Nov 23, 2022

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm thinking should we refactor OpenSearchDataType? Rather than static Enum, we instantiate it with mapping info for each field during analyzing.

class OpenSearchDataType implements ExprType:

  val mappingInfo;

  val osTypeName;

  ...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@dai-chen,
Please see changes in #180. They don't contain date format processing, but only refactor of OpenSearchDataType.

@Yury-Fridlyand
Copy link
Author

Superseded by #273

@Yury-Fridlyand Yury-Fridlyand deleted the dev-spike-use-mapping-ts-format branch June 29, 2023 23:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants